ProGreSS: Simultaneous Searching of Protein Databases by Sequence and Structure
نویسندگان
چکیده
We consider the problem of similarity searches on protein databases based on both sequence and structure information simultaneously. Our program extracts feature vectors from both the sequence and structure components of the proteins. These feature vectors are then combined and indexed using a novel multi-dimensional index structure. For a given query, we employ this index structure to find candidate matches from the database. We develop a new method for computing the statistical significance of these candidates. The candidates with high significance are then aligned to the query protein using the Smith-Waterman technique to find the optimal alignment. The experimental results show that our method can classify up to 97% of the superfamilies and up to 100% of the classes correctly according to the SCOP classification. Our method is up to 37 times faster than CTSS, a recent structure search technique, combined with Smith-Waterman technique for sequences.
منابع مشابه
Title: ProGreSS: SIMULTANEOUS SEARCHING OF PROTEIN DATABASES BY SEQUENCE AND STRUCTURE PSB Session: Joint Learning from Multiple Types of Genomic Data Authors: ARNAB BHATTACHARYA, TOLGA CAN, TAMER KAHVECI, AMBUJ
We consider the problem of similarity searches on protein databases based on both sequence and structure information simultaneously. Our program extracts feature vectors from both the sequence and structure components of the proteins. These feature vectors are then combined and indexed using a novel multi-dimensional index structure. For a given query, we employ this index structure to find can...
متن کاملMolecular characterization of apolipoprotein A-I from the skin mucosa of Cyprinus carpio
Apolipoprotein A-I is the most abundant protein in Cyprinus carpio plasma that plays an important role in lipid transport and protection of the skin by means of its antimicrobial activity. A 527 bp cDNA fragment encoding C terminus part of apoA-I from the skin mucosa of common carp was isolated using RT-PCR. After GenBank database searching, a partial sequence containing a coding sequence (CDS)...
متن کاملProtein Databases
Proteins are sources of many peptides with diverse biological activity. Some of them are considered as valuable components of foods and drug targets with desired and designed biological activity. We are now entering an era rich in biological data in which the field of bioinformatics is poised to exploit this information in increasingly powerful ways. There are currently many databases all over ...
متن کاملSCANMOT: searching for similar sequences using a simultaneous scan of multiple sequence motifs
Establishment of similarities between proteins is very important for the study of the relationship between sequence, structure and function and for the analysis of evolutionary relationships. Motif-based search methods play a crucial role in establishing the connections between proteins that are particularly useful for distant relationships. This paper reports SCANMOT, a web-based server that s...
متن کاملMolecular characterization of apolipoprotein A-I from the skin mucosa of Cyprinus carpio
Apolipoprotein A-I is the most abundant protein in Cyprinus carpio plasma that plays an important role in lipid transport and protection of the skin by means of its antimicrobial activity. A 527 bp cDNA fragment encoding C terminus part of apoA-I from the skin mucosa of common carp was isolated using RT-PCR. After GenBank database searching, a partial sequence containing a coding sequence (CDS)...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
دوره شماره
صفحات -
تاریخ انتشار 2004